
Add lazy Inductor compilation to graph_pp_runner#360

Merged
xmfan merged 2 commits into main from xmfan/stack/30
Mar 17, 2026

@xmfan xmfan commented Mar 10, 2026

Stacked PRs:


Add lazy Inductor compilation to graph_pp_runner

Add _execute_graph() that lazily compiles graph modules with
compile_fx_inner on first invocation. Controlled by an inductor kwarg
threaded through all _run_* functions.
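The compile-on-first-call behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `execute_graph`, `fake_compile`, and the module-level cache are hypothetical names, and `fake_compile` is a stand-in for the real compiler entry point (`compile_fx_inner`), whose exact call is not shown here.

```python
from typing import Any, Callable, Dict, Tuple

# Counts how often the (stand-in) compiler runs, so laziness is observable.
compile_calls = 0


def fake_compile(graph: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in for a compiler such as compile_fx_inner."""
    global compile_calls
    compile_calls += 1

    def compiled(*args: Any) -> Any:
        return graph(*args)

    return compiled


# Cache of compiled callables, keyed by the original graph object (assumption:
# the graph object is hashable, which holds for plain functions and nn.Modules).
_compiled_cache: Dict[Callable[..., Any], Callable[..., Any]] = {}


def execute_graph(
    graph: Callable[..., Any],
    args: Tuple[Any, ...],
    inductor: bool = False,
    compile_fn: Callable[[Callable[..., Any]], Callable[..., Any]] = fake_compile,
) -> Any:
    """Run `graph` eagerly, or compile it on first use when `inductor` is set."""
    if not inductor:
        return graph(*args)
    compiled = _compiled_cache.get(graph)
    if compiled is None:
        # First invocation: pay the compile cost once, then reuse the result.
        compiled = compile_fn(graph)
        _compiled_cache[graph] = compiled
    return compiled(*args)


def add_one(x: int) -> int:
    return x + 1


first = execute_graph(add_one, (1,), inductor=True)   # triggers compilation
second = execute_graph(add_one, (2,), inductor=True)  # reuses cached result
```

With `inductor=False` the graph runs directly and the cache is never touched, so the eager path stays unchanged.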

GraphPPRunner accepts inductor=True and propagates it to all
GraphPipelineStage instances, which the stage_* action functions
read when calling _run_*.
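A rough sketch of how such a flag might flow from the runner to the stages and into the run functions is given below. `Stage`, `Runner`, `run_forward`, and `stage_forward` are hypothetical stand-ins for `GraphPipelineStage`, `GraphPPRunner`, a `_run_*` function, and a `stage_*` action function; the real signatures are not shown in this PR description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """Hypothetical stand-in for GraphPipelineStage."""
    name: str
    inductor: bool = False


class Runner:
    """Hypothetical stand-in for GraphPPRunner."""

    def __init__(self, stages: List[Stage], inductor: bool = False) -> None:
        self.stages = stages
        # Propagate the runner-level setting to every stage once, up front.
        for stage in self.stages:
            stage.inductor = inductor


def run_forward(x: int, inductor: bool = False) -> Tuple[str, int]:
    """Stand-in for a _run_* function; reports which path it would take."""
    mode = "inductor" if inductor else "eager"
    return mode, x * 2


def stage_forward(stage: Stage, x: int) -> Tuple[str, int]:
    """Stand-in for a stage_* action: reads the flag off the stage."""
    return run_forward(x, inductor=stage.inductor)


runner = Runner([Stage("stage0"), Stage("stage1")], inductor=True)
results = [stage_forward(stage, 3) for stage in runner.stages]
```

Storing the flag on the stage means the action functions need no extra argument; they only look at the stage they are handed.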

Authored with Claude.

@meta-cla meta-cla bot added the CLA Signed label Mar 10, 2026
@xmfan xmfan changed the base branch from xmfan/stack/29 to main March 10, 2026 21:57
@xmfan xmfan changed the base branch from main to xmfan/stack/29 March 10, 2026 21:57
@xmfan xmfan requested a review from sanketpurandare March 12, 2026 06:07
@xmfan xmfan marked this pull request as ready for review March 12, 2026 06:07
@sanketpurandare sanketpurandare left a comment

The CI is failing for this

@xmfan xmfan changed the base branch from xmfan/stack/29 to main March 16, 2026 23:41
@sanketpurandare sanketpurandare self-requested a review March 17, 2026 00:22
@sanketpurandare sanketpurandare left a comment

LGTM, can merge after CI passes

@xmfan xmfan force-pushed the xmfan/stack/30 branch 3 times, most recently from 5288597 to 2bac665 Compare March 17, 2026 00:57
@xmfan xmfan changed the base branch from main to xmfan/stack/32 March 17, 2026 00:57
xmfan added 2 commits March 16, 2026 20:46
The module-level `dispatcher.sharding_propagator = CustomShardingPropagator()`
was leaking into other test files (e.g. test_api.py) when run in the same
pytest process, causing `aten.copy_` failures because the custom propagator
doesn't have rules for ops that the default DTensor propagator handles.

test_dtensor.py's two test classes (ImplicitRegistrationTest, DimShardingTest)
inherit from DTensorTestBase which uses MultiProcessTestCase -- each test
spawns subprocesses that re-import the module. Those subprocesses don't run
pytest fixtures, so they need the custom propagator installed at module level.
We gate the module-level install on `multiprocessing.current_process().name`
to only run in spawned workers, and use a module-scoped autouse pytest fixture
to install/restore the propagator in the main process.
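The gating described in this commit message might look roughly like the sketch below. It uses a hypothetical `Dispatcher` object and string placeholders instead of the real DTensor dispatcher and `CustomShardingPropagator`, and it assumes the gate compares the process name against `"MainProcess"` (the name Python gives the main process); the actual condition in test_dtensor.py is not shown here.

```python
import contextlib
import multiprocessing
from typing import Iterator


class Dispatcher:
    """Hypothetical stand-in for the DTensor dispatcher object."""

    def __init__(self) -> None:
        self.sharding_propagator = "default"


dispatcher = Dispatcher()
CUSTOM_PROPAGATOR = "custom"  # placeholder for CustomShardingPropagator()


def in_spawned_worker() -> bool:
    # Assumption: any process not named "MainProcess" is a spawned worker.
    return multiprocessing.current_process().name != "MainProcess"


# Module-level install, but only in spawned workers, which re-import this
# module and never run pytest fixtures.
if in_spawned_worker():
    dispatcher.sharding_propagator = CUSTOM_PROPAGATOR


@contextlib.contextmanager
def custom_propagator() -> Iterator[None]:
    """Install the custom propagator and restore the previous one on exit."""
    saved = dispatcher.sharding_propagator
    dispatcher.sharding_propagator = CUSTOM_PROPAGATOR
    try:
        yield
    finally:
        dispatcher.sharding_propagator = saved
```

In a pytest file, the same install/restore body would sit in a generator decorated with `@pytest.fixture(scope="module", autouse=True)`, so the main process gets the custom propagator only while that module's tests run.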

Authored with Claude.

stack-info: PR: #367, branch: xmfan/stack/32
Add _execute_graph() that lazily compiles graph modules with
compile_fx_inner on first invocation. Controlled by an inductor kwarg
threaded through all _run_* functions.

GraphPPRunner accepts inductor=True and propagates it to all
GraphPipelineStage instances, which the stage_* action functions
read when calling _run_*.

Authored with Claude.

stack-info: PR: #360, branch: xmfan/stack/30
@xmfan xmfan changed the base branch from xmfan/stack/32 to main March 17, 2026 03:49
@xmfan xmfan changed the base branch from main to xmfan/stack/32 March 17, 2026 03:49
Base automatically changed from xmfan/stack/32 to main March 17, 2026 14:28
@xmfan xmfan merged commit f3295b3 into main Mar 17, 2026
10 checks passed
xmfan added a commit that referenced this pull request Mar 17, 2026
* Scope CustomShardingPropagator to test_dtensor tests via pytest fixture


* Add lazy Inductor compilation to graph_pp_runner


* Add --inductor flag to example_ds3_pp with FORCE_BALANCED_ROUTING

The DSv3 MoE implementation uses .tolist() and data-dependent grouped_mm
offsets that Inductor cannot compile. Add FORCE_BALANCED_ROUTING to the
model that makes token-per-expert counts uniform and uses balanced
all-to-all splits, eliminating all data-dependent ops.
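To illustrate why uniform counts remove the data dependence, here is a small sketch under stated assumptions: the per-expert counts and group offsets are derived from static sizes alone instead of being read out of a tensor with `.tolist()`. The function names are hypothetical, and the sketch assumes the token count divides evenly across experts; how the PR handles other cases is not shown.

```python
from typing import List


def balanced_tokens_per_expert(num_tokens: int, num_experts: int) -> List[int]:
    """Uniform per-expert counts computed from static sizes only."""
    if num_tokens % num_experts != 0:
        # Assumption of this sketch: tokens divide evenly across experts.
        raise ValueError("num_tokens must be divisible by num_experts")
    return [num_tokens // num_experts] * num_experts


def group_offsets(counts: List[int]) -> List[int]:
    """Cumulative end offsets for each expert's group of tokens."""
    offsets: List[int] = []
    total = 0
    for count in counts:
        total += count
        offsets.append(total)
    return offsets


counts = balanced_tokens_per_expert(num_tokens=8, num_experts=4)
offsets = group_offsets(counts)
```

Because both lists depend only on `num_tokens` and `num_experts`, they are known before any routing result exists, which is what lets a compiler treat them as constants.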

The --inductor CLI flag enables both Inductor compilation and forced
balanced routing together.
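One way a single flag could switch on both behaviors is sketched below with `argparse`. Only the `--inductor` flag name comes from the commit message; `parse_args`, `build_settings`, and the settings keys are hypothetical.

```python
import argparse
from typing import Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Pipeline-parallel example (sketch)")
    parser.add_argument(
        "--inductor",
        action="store_true",
        help="Compile graphs with Inductor and force balanced routing.",
    )
    return parser.parse_args(argv)


def build_settings(args: argparse.Namespace) -> Dict[str, bool]:
    # Both settings follow the same flag, so they cannot drift apart.
    return {
        "inductor": args.inductor,
        "force_balanced_routing": args.inductor,
    }


enabled = build_settings(parse_args(["--inductor"]))
disabled = build_settings(parse_args([]))
```

Tying the two settings to one flag avoids the invalid combination of Inductor compilation with data-dependent routing.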

Authored with Claude.

stack-info: PR: #361, branch: xmfan/stack/31